Grammar Index by Induced Suffix Sorting
Abstract
We propose a new compressed text index built upon a grammar compression based on induced suffix sorting [Nunes et al., DCC'18]. We show that this grammar exhibits a locality sensitive parsing property, which allows us to specify, given a pattern P, certain substrings of P, called cores, that are similarly parsed in the text grammar whenever these occurrences are extensible to occurrences of P. Supported by the cores, given a pattern of length m, we can locate all its \(\text{occ}\) occurrences in a text T of length n within \(\mathcal{O}(m \lg |\mathcal{S}| + \text{occ}_C \lg |\mathcal{S}| \lg n + \text{occ})\) time, where \(\mathcal{S}\) is the set of all characters and non-terminals, \(\text{occ}\) is the number of occurrences, and \(\text{occ}_C\) is the number of occurrences of a chosen core C of P in the right hand side of all production rules of the grammar of T. Our grammar index requires \(\mathcal{O}(g)\) words of space and can be built in \(\mathcal{O}(n)\) time using \(\mathcal{O}(g)\) working space, where g is the sum of the right hand side lengths of all production rules. We practically evaluate that our proposed index excels at locating long patterns in highly-repetitive texts. Our implementation is available at https://github.com/TooruAkagi/GCIS_Index.
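To make the parsing idea concrete, the following is a minimal Python sketch of one round of a grammar built from the factorization used in induced suffix sorting. It is an illustration under simplifying assumptions, not the authors' implementation: the function names (`classify`, `lms_positions`, `one_round`) are hypothetical, the text is cut at its LMS (leftmost S-type) positions, an implicit sentinel smaller than every character is assumed after the last position, and non-terminals are numbered in order of first appearance rather than by lexicographic rank.

```python
def classify(text):
    """Return a list of booleans: True for S-type positions, False for L-type.

    Assumes an implicit sentinel, smaller than every character, after the
    last position, so the last character is always L-type.
    """
    n = len(text)
    is_s = [False] * n
    for i in range(n - 2, -1, -1):
        if text[i] < text[i + 1]:
            is_s[i] = True
        elif text[i] == text[i + 1]:
            is_s[i] = is_s[i + 1]
    return is_s


def lms_positions(text):
    """Positions that are S-type and whose left neighbour is L-type."""
    is_s = classify(text)
    return [i for i in range(1, len(text)) if is_s[i] and not is_s[i - 1]]


def one_round(text):
    """Cut the text at its LMS positions and name each distinct factor.

    Returns the reduced sequence of non-terminal ids and the rule table
    mapping each factor (right hand side) to its id.
    """
    bounds = [0] + lms_positions(text) + [len(text)]
    factors = [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]
    rules = {}
    reduced = []
    for factor in factors:
        if factor not in rules:
            rules[factor] = len(rules)  # id by first appearance (assumption)
        reduced.append(rules[factor])
    return reduced, rules


if __name__ == "__main__":
    reduced, rules = one_round("mississippi")
    print(reduced)  # [0, 1, 1, 2]
    print(rules)    # {'m': 0, 'iss': 1, 'ippi': 2}
```

In this example both occurrences of `iss` are cut at the same relative positions and receive the same non-terminal, which is the kind of locality the abstract relies on: the cut points depend only on nearby characters. Applying the round again to the reduced sequence would yield the next level of the grammar.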
Similar resources
A Grammar Compression Algorithm based on Induced Suffix Sorting
We introduce GCIS, a grammar compression algorithm based on the induced suffix sorting algorithm SAIS, presented by Nong et al. in 2009. Our solution builds on the factorization performed by SAIS during suffix sorting. We construct a context-free grammar on the input string which can be further reduced into a shorter string by substituting each substring by its corresponding factor. The resulti...
In-Place Suffix Sorting
Given a string T = T[1, ..., n], the suffix sorting problem is to lexicographically sort the suffixes T[i, ..., n] for all i. This problem is central to the construction of suffix arrays and trees with many applications in string processing, computational biology and compression. A bottleneck in these applications is the amount of workspace needed to perform suffix sorting beyond the spac...
Faster suffix sorting
We propose a fast and memory efficient algorithm for lexicographically sorting the suffixes of a string, a problem that has important applications in data compression as well as string matching. Our algorithm eliminates much of the overhead of previous specialized approaches while maintaining their robustness for all kinds of input. For input size n, our algorithm operates in only two integer a...
Notes on Suffix Sorting
We study the problem of lexicographically sorting the suffixes of a string of symbols. In particular, we analyze the time complexity of Sadakane’s suffix sorting algorithm [8], showing that this is O(n log n) in the worst case. We also give a small improvement in the space requirements of this algorithm. We conclude that Sadakane’s algorithm, which has previously been shown to outperform the cl...
Parallel Suffix Sorting
We present a parallel algorithm for lexicographically sorting the suffixes of a string. Suffix sorting has applications in string processing, data compression and computational biology. The ordered list of suffixes of a string stored in an array is known as Suffix Array, an important data structure in string processing and computational biology. Our focus is on deriving a practical implementati...
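The suffix sorting problem that the entries listed above address can be stated concretely with a short Python sketch. This is a deliberately naive baseline, not any of the algorithms from the listed papers: it sorts suffixes by direct comparison, uses 0-based positions, and the helper names `suffix_array` and `locate` are hypothetical.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction: each comparison may scan whole suffixes, so this
    is only suitable for small inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def locate(text, sa, pattern):
    """Return all start positions of pattern in text, in increasing order.

    Uses two binary searches over the suffix array: suffixes that start
    with the pattern form one contiguous range.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "mississippi"
    sa = suffix_array(text)
    print(sa)                        # [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
    print(locate(text, sa, "issi"))  # [1, 4]
```

The specialized algorithms in the listed papers compute the same array while avoiding the repeated suffix comparisons and the extra memory that this baseline incurs.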
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-86692-1_8